R Distribution (rdist) — Symmetric Beta on \([-1, 1]\)#

The R distribution (SciPy: scipy.stats.rdist) is a one-parameter family of symmetric, bounded continuous distributions on \([-1, 1]\).

It is a convenient model whenever:

  • values are naturally constrained to \([-1, 1]\) (correlations, cosines, normalized similarity scores)

  • symmetry around 0 is reasonable

  • you want a single concentration parameter controlling mass near the center vs near the endpoints

A key identity makes it especially approachable:

\[ X \sim \mathrm{rdist}(c) \quad\Longleftrightarrow\quad \frac{X+1}{2} \sim \mathrm{Beta}\!\left(\frac{c}{2}, \frac{c}{2}\right). \]

That means we can borrow intuition, derivations, and sampling algorithms from the Beta distribution.

import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio

import scipy
from scipy import special, stats

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")

np.set_printoptions(precision=6, suppress=True)
rng = np.random.default_rng(42)

print("numpy:", np.__version__)
print("scipy:", scipy.__version__)
numpy: 1.26.2
scipy: 1.15.0

1) Title & Classification#

Distribution name: rdist (R distribution, symmetric beta)

Type: continuous

Support (canonical):

\[ -1 \le x \le 1 \]

Parameter space (canonical):

  • shape parameter: \(c > 0\)

SciPy parameterization: stats.rdist(c, loc=0, scale=1) uses a location–scale transform

\[ Y = \mathrm{loc} + \mathrm{scale}\,X, \quad \mathrm{scale} > 0, \]

so the support becomes

\[ \mathrm{loc} - \mathrm{scale} \le y \le \mathrm{loc} + \mathrm{scale}. \]

2) Intuition & Motivation#

What this distribution models#

rdist models a bounded continuous quantity that is:

  • constrained to \([-1, 1]\)

  • symmetric about 0

  • optionally U-shaped (more mass near \(\pm1\)) or bell-shaped (more mass near 0)

Typical real-world use cases#

  • Correlation-like quantities: priors or phenomenological models for correlation coefficients and similarity scores.

  • Cosines and angles: if \(\theta\) is an angle, \(\cos\theta \in [-1,1]\). rdist can model cosine-like measurements.

  • Random directions / projections: if \(U\) is uniform on the unit sphere \(S^{d-1}\subset\mathbb{R}^d\), then the first coordinate \(U_1\) has density proportional to \((1-u^2)^{(d-3)/2}\), which is rdist with

\[ c = d-1. \]

This gives an interpretation of \(c\) as an “effective dimension”.
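This sphere connection is easy to check empirically; here is a minimal sketch (a Kolmogorov–Smirnov test of the first coordinate of uniformly random unit vectors against stats.rdist, with the dimension d chosen arbitrarily):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
d = 5  # ambient dimension (illustrative choice)

# Uniform direction on S^{d-1}: normalize a standard Gaussian vector.
z = rng.normal(size=(20_000, d))
u1 = (z / np.linalg.norm(z, axis=1, keepdims=True))[:, 0]

# The first coordinate should follow rdist(c = d - 1).
ks = stats.kstest(u1, stats.rdist(d - 1).cdf)
print(ks.statistic, ks.pvalue)  # small statistic: consistent with rdist(d-1)
```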

Relations to other distributions#

  • Symmetric Beta: \((X+1)/2 \sim \mathrm{Beta}(c/2,c/2)\).

  • Special cases:

    • \(c=1\): arcsine law on \([-1,1]\), density \(\propto 1/\sqrt{1-x^2}\)

    • \(c=2\): uniform on \([-1,1]\)

    • \(c=3\): semicircle law, density \(\propto \sqrt{1-x^2}\)

  • Large \(c\): the distribution concentrates near 0 and is well approximated by a normal with variance \(1/(c+1)\) (after matching mean/variance).
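The special cases can be confirmed numerically; a small sketch comparing stats.rdist against the fully normalized closed-form densities:

```python
import numpy as np
from scipy import stats

x = np.linspace(-0.99, 0.99, 201)

# c=1: arcsine density 1 / (pi * sqrt(1 - x^2))
arcsine = 1.0 / (np.pi * np.sqrt(1 - x**2))
# c=2: uniform density 1/2 on [-1, 1]
uniform = np.full_like(x, 0.5)
# c=3: semicircle density (2/pi) * sqrt(1 - x^2)
semicircle = (2.0 / np.pi) * np.sqrt(1 - x**2)

for c, ref in [(1, arcsine), (2, uniform), (3, semicircle)]:
    print(c, np.max(np.abs(stats.rdist.pdf(x, c) - ref)))
```

The normalizing constants follow from \(B(\tfrac12,\tfrac12)=\pi\), \(B(\tfrac12,1)=2\), and \(B(\tfrac12,\tfrac32)=\pi/2\).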

3) Formal Definition#

Let \(X \sim \mathrm{rdist}(c)\) with \(c>0\). Define \(a = c/2\).

PDF#

The probability density function is

\[ f(x\mid c) = \frac{(1-x^2)^{a-1}}{B\!\left(\tfrac12, a\right)} = \frac{(1-x^2)^{\frac{c}{2}-1}}{B\!\left(\tfrac12, \tfrac{c}{2}\right)}, \qquad -1\le x\le 1, \]

and \(f(x\mid c)=0\) for \(|x|>1\).

Here \(B(\cdot,\cdot)\) is the Beta function

\[ B(p,q) = \int_0^1 t^{p-1}(1-t)^{q-1}\,dt = \frac{\Gamma(p)\Gamma(q)}{\Gamma(p+q)}. \]

CDF#

Using the symmetric-Beta identity, the CDF can be written via the regularized incomplete beta function \(I_z\):

\[ F(x\mid c) = \mathbb{P}(X\le x) = I_{\frac{x+1}{2}}(a,a), \qquad -1\le x\le 1. \]

Equivalently (making symmetry explicit), for \(x\in[-1,1]\):

\[ F(x\mid c) = \tfrac12 + \tfrac12\,\mathrm{sign}(x)\,I_{x^2}\!\left(\tfrac12, a\right). \]
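The symmetric form is not used by the implementation below, but it is easy to check against SciPy (a sketch using scipy.special.betainc for the regularized incomplete beta function):

```python
import numpy as np
from scipy import special, stats

c = 5.0
a = c / 2
x = np.linspace(-0.999, 0.999, 9)

# F(x) = 1/2 + sign(x)/2 * I_{x^2}(1/2, a)
cdf_sym = 0.5 + 0.5 * np.sign(x) * special.betainc(0.5, a, x**2)
print(np.max(np.abs(cdf_sym - stats.rdist.cdf(x, c))))  # agreement to float precision
```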

def _check_c(c: float) -> float:
    c = float(c)
    if not (np.isfinite(c) and c > 0):
        raise ValueError("Require a finite shape parameter c > 0.")
    return c


def rdist_logpdf(x, c: float):
    '''Log-PDF of the canonical rdist(c) on [-1, 1]. Vectorized over x.'''
    c = _check_c(c)
    a = 0.5 * c

    x = np.asarray(x, dtype=float)
    out = np.full_like(x, -np.inf, dtype=float)

    inside = (x > -1.0) & (x < 1.0)
    xi = x[inside]

    # log f(x|c) = (a-1) * log(1-x^2) - log B(1/2, a)
    out[inside] = (a - 1.0) * np.log1p(-(xi * xi)) - special.betaln(0.5, a)

    # At exactly x=±1 the density is 0 for c>2 and diverges (integrably) for c<2;
    # neither case carries probability mass, so we leave those entries at -inf and
    # special-case only c=2 (uniform), where the density is 1/2 on all of [-1,1].
    edges = np.isclose(np.abs(x), 1.0)
    if np.any(edges) and np.isclose(c, 2.0):
        out[edges] = -np.log(2.0)

    return out


def rdist_pdf(x, c: float):
    return np.exp(rdist_logpdf(x, c))


def rdist_cdf(x, c: float):
    '''CDF of the canonical rdist(c) on [-1, 1]. Vectorized over x.'''
    c = _check_c(c)
    a = 0.5 * c

    x = np.asarray(x, dtype=float)
    y = np.clip((x + 1.0) / 2.0, 0.0, 1.0)
    out = special.betainc(a, a, y)

    out = np.where(x <= -1.0, 0.0, out)
    out = np.where(x >= 1.0, 1.0, out)

    return out


def rdist_rvs_numpy(c: float, size=1, rng=None):
    '''NumPy-only sampler for rdist(c) using Beta(a,a) with a=c/2.

    Uses the identity:
        If G1,G2 ~ Gamma(a,1) iid, then G1/(G1+G2) ~ Beta(a,a).
        Then X = 2*Beta(a,a) - 1.
    '''
    c = _check_c(c)
    a = 0.5 * c

    rng = np.random.default_rng() if rng is None else rng
    g1 = rng.gamma(shape=a, scale=1.0, size=size)
    g2 = rng.gamma(shape=a, scale=1.0, size=size)

    y = g1 / (g1 + g2)
    return 2.0 * y - 1.0

4) Moments & Properties#

Let \(X \sim \mathrm{rdist}(c)\) and \(a=c/2\).

Mean, variance, skewness, kurtosis#

By symmetry, all odd moments are 0. In particular:

\[ \mathbb{E}[X]=0,\qquad \mathrm{skewness}=0. \]

The second moment (hence variance) is

\[ \mathbb{E}[X^2] = \frac{1}{c+1}, \qquad \mathrm{Var}(X)=\frac{1}{c+1}. \]

The fourth moment is

\[ \mathbb{E}[X^4] = \frac{3}{(c+1)(c+3)}, \]

which implies excess kurtosis

\[ \gamma_2 = \frac{\mathbb{E}[X^4]}{\mathrm{Var}(X)^2} - 3 = -\frac{6}{c+3}. \]

More generally, for \(n\ge 0\):

\[ \mathbb{E}[X^{2n}] = \frac{B\!\left(n+\tfrac12, a\right)}{B\!\left(\tfrac12, a\right)}, \qquad \mathbb{E}[X^{2n+1}] = 0. \]
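These moment formulas can be cross-checked by direct numerical integration against the density (a sketch using scipy.integrate.quad; c is an arbitrary test value):

```python
import numpy as np
from scipy import integrate, special, stats

c = 5.0
a = c / 2

def moment(k):
    # E[X^k] = integral of x^k * f(x|c) over (-1, 1)
    val, _ = integrate.quad(lambda x: x**k * stats.rdist.pdf(x, c), -1, 1)
    return val

print(moment(1))                            # odd moment: ~0
print(moment(2), 1 / (c + 1))               # second-moment formula
print(moment(4), 3 / ((c + 1) * (c + 3)))   # fourth-moment formula
# general even-moment formula: E[X^{2n}] = B(n + 1/2, a) / B(1/2, a), here n=3
print(moment(6), special.beta(3.5, a) / special.beta(0.5, a))
```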

MGF and characteristic function#

The moment generating function exists for all real \(t\):

\[ M_X(t) = \mathbb{E}[e^{tX}] = \Gamma\!\left(a+\tfrac12\right)\left(\frac{2}{|t|}\right)^{a-\tfrac12} I_{a-\tfrac12}(|t|), \qquad t\ne 0, \]

with \(M_X(0)=1\). Here \(I_\nu\) is the modified Bessel function of the first kind.

The characteristic function is

\[ \varphi_X(\omega) = \mathbb{E}[e^{i\omega X}] = \Gamma\!\left(a+\tfrac12\right)\left(\frac{2}{|\omega|}\right)^{a-\tfrac12} J_{a-\tfrac12}(|\omega|), \qquad \omega\ne 0, \]

with \(\varphi_X(0)=1\).

Entropy#

Using the Beta connection, the differential entropy is:

\[ \mathsf{H}(X) = \log\bigl(2B(a,a)\bigr) - 2(a-1)\,\psi(a) + (2a-2)\,\psi(2a), \]

where \(\psi\) is the digamma function.

def rdist_mean(c: float) -> float:
    _check_c(c)
    return 0.0


def rdist_var(c):
    c = np.asarray(c, dtype=float)
    if np.any(~np.isfinite(c)) or np.any(c <= 0):
        raise ValueError("Require finite c > 0.")
    return 1.0 / (c + 1.0)


def rdist_excess_kurtosis(c: float) -> float:
    c = _check_c(c)
    return -6.0 / (c + 3.0)


def rdist_even_moment(n: int, c: float) -> float:
    '''Return E[X^(2n)] for X ~ rdist(c) (n >= 0).'''
    c = _check_c(c)
    n = int(n)
    if n < 0:
        raise ValueError("Require n >= 0.")
    a = 0.5 * c
    return float(np.exp(special.betaln(n + 0.5, a) - special.betaln(0.5, a)))


def rdist_mgf(t, c: float):
    '''MGF M_X(t) using the Bessel-I representation (real, even in t).'''
    c = _check_c(c)
    a = 0.5 * c
    nu = a - 0.5

    t = np.asarray(t, dtype=float)
    out = np.ones_like(t, dtype=float)

    mask = t != 0
    abs_t = np.abs(t[mask])
    out[mask] = special.gamma(a + 0.5) * (2.0 / abs_t) ** nu * special.iv(nu, abs_t)

    return out


def rdist_cf(w, c: float):
    '''Characteristic function φ_X(ω) using the Bessel-J representation (real, even in ω).'''
    c = _check_c(c)
    a = 0.5 * c
    nu = a - 0.5

    w = np.asarray(w, dtype=float)
    out = np.ones_like(w, dtype=float)

    mask = w != 0
    abs_w = np.abs(w[mask])
    out[mask] = special.gamma(a + 0.5) * (2.0 / abs_w) ** nu * special.jv(nu, abs_w)

    return out


def rdist_entropy(c: float) -> float:
    '''Differential entropy of rdist(c) on [-1,1].'''
    c = _check_c(c)
    a = 0.5 * c
    return float(
        np.log(2.0)
        + special.betaln(a, a)
        - 2.0 * (a - 1.0) * special.digamma(a)
        + (2.0 * a - 2.0) * special.digamma(2.0 * a)
    )


# Quick numerical cross-check against SciPy
c0 = 5.0
x0 = np.linspace(-0.999, 0.999, 7)

print("max |pdf - scipy|:", np.max(np.abs(rdist_pdf(x0, c0) - stats.rdist.pdf(x0, c0))))
print("max |cdf - scipy|:", np.max(np.abs(rdist_cdf(x0, c0) - stats.rdist.cdf(x0, c0))))

print("mean,var,skew,kurt(excess) (formula):", rdist_mean(c0), rdist_var(c0), 0.0, rdist_excess_kurtosis(c0))
print("mean,var,skew,kurt(excess) (scipy):  ", stats.rdist.stats(c0, moments="mvsk"))

print("entropy (formula):", rdist_entropy(c0))
print("entropy (scipy):  ", stats.rdist.entropy(c0))

# Monte Carlo check
n_mc = 200_000
s = rdist_rvs_numpy(c0, size=n_mc, rng=rng)
print("MC mean/var:", s.mean(), s.var(ddof=0))
print("MGF(t=1) formula vs MC:", float(rdist_mgf(1.0, c0)), float(np.mean(np.exp(1.0 * s))))
max |pdf - scipy|: 2.220446049250313e-16
max |cdf - scipy|: 0.0
mean,var,skew,kurt(excess) (formula): 0.0 0.16666666666666666 0.0 -0.75
mean,var,skew,kurt(excess) (scipy):   (0.0, 0.16666666666666663, 0.0, -0.7499999999999991)
entropy (formula): 0.49334217451750995
entropy (scipy):   0.4933421745175011
MC mean/var: -0.0004876065533899298 0.1668664435284834
MGF(t=1) formula vs MC: 1.0859813581363065 1.0855688443043081

5) Parameter Interpretation#

The single shape parameter \(c\) controls where the mass lives:

  • \(0 < c < 2\): \(\tfrac{c}{2}-1 < 0\), so \((1-x^2)^{\frac{c}{2}-1}\) blows up near \(x=\pm 1\). The distribution is U-shaped.

  • \(c = 2\): the exponent is 0, so the density is constant. This is Uniform\((-1,1)\).

  • \(c > 2\): the exponent is positive, so the density goes to 0 at \(\pm 1\) and peaks at 0. The distribution is bell-shaped.

A useful quantitative summary is the variance:

\[ \mathrm{Var}(X) = \frac{1}{c+1}. \]

Larger \(c\) means smaller variance and stronger concentration around 0.
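To make "where the mass lives" concrete, here is a small sketch of the tail probability \(P(|X| > 0.9)\) as a function of \(c\), using stats.rdist.sf:

```python
import numpy as np
from scipy import stats

for c in [0.5, 1.0, 2.0, 3.0, 10.0]:
    # By symmetry, P(|X| > 0.9) = 2 * (1 - F(0.9)) = 2 * sf(0.9).
    tail = 2 * stats.rdist.sf(0.9, c)
    print(f"c={c:5.1f}  P(|X|>0.9) = {tail:.4f}")
```

For the uniform case \(c=2\) this is exactly \(2 \times 0.05 = 0.1\); smaller \(c\) pushes more mass to the endpoints, larger \(c\) pulls it toward 0.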

# How the PDF changes with c
cs = [0.5, 1.0, 2.0, 3.0, 10.0]
eps = 1e-4
x = np.linspace(-1 + eps, 1 - eps, 1500)

fig = go.Figure()
for c in cs:
    fig.add_trace(go.Scatter(x=x, y=rdist_pdf(x, c), name=f"c={c}"))

fig.update_layout(
    title="R distribution PDF for different c",
    xaxis_title="x",
    yaxis_title="density",
)
fig.show()

# Variance as a function of c
c_grid = np.linspace(0.2, 20, 200)
fig2 = px.line(
    x=c_grid,
    y=rdist_var(c_grid),
    labels={"x": "c", "y": "Var(X)"},
    title="Variance Var(X)=1/(c+1)",
)
fig2.show()

6) Derivations#

We sketch three core derivations in the canonical \([-1,1]\) parameterization.

6.1 Expectation#

The density satisfies \(f(x\mid c)=f(-x\mid c)\). Therefore the integrand \(x f(x\mid c)\) is odd, so

\[ \mathbb{E}[X] = \int_{-1}^1 x f(x\mid c)\,dx = 0. \]

6.2 Variance#

Because \(\mathbb{E}[X]=0\), \(\mathrm{Var}(X)=\mathbb{E}[X^2]\). Write \(a=c/2\).

\[ \mathbb{E}[X^2] = \frac{1}{B(\tfrac12,a)}\int_{-1}^1 x^2 (1-x^2)^{a-1}\,dx. \]

Use symmetry and substitute \(u=x^2\):

\[ \begin{aligned} \int_{-1}^1 x^2 (1-x^2)^{a-1}\,dx &= 2\int_0^1 x^2 (1-x^2)^{a-1}\,dx \\ &= 2\int_0^1 u\,(1-u)^{a-1}\,\frac{1}{2\sqrt{u}}\,du \\ &= \int_0^1 u^{\frac32-1}(1-u)^{a-1}\,du \\ &= B\!\left(\tfrac32,a\right). \end{aligned} \]

So

\[ \mathbb{E}[X^2] = \frac{B(\tfrac32,a)}{B(\tfrac12,a)} = \frac{1}{2a+1} = \frac{1}{c+1}. \]

6.3 Likelihood (i.i.d. sample)#

Given i.i.d. data \(x_1,\dots,x_n\in(-1,1)\), the likelihood for \(c\) is

\[ L(c) = \prod_{i=1}^n \frac{(1-x_i^2)^{\frac{c}{2}-1}}{B(\tfrac12,\tfrac{c}{2})}. \]

The log-likelihood is

\[ \ell(c) = \left(\frac{c}{2}-1\right)\sum_{i=1}^n \log(1-x_i^2) - n\,\log B\!\left(\tfrac12,\tfrac{c}{2}\right). \]

Differentiating (using \(\frac{d}{dz}\log\Gamma(z)=\psi(z)\)) gives the score equation

\[ 0 = \ell'(c) = \frac12\sum_{i=1}^n \log(1-x_i^2) - \frac{n}{2}\Bigl(\psi(\tfrac{c}{2}) - \psi(\tfrac{c+1}{2})\Bigr), \]

which is typically solved numerically for the MLE \(\hat c\).
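A minimal sketch of solving the score equation directly with a bracketing root-finder (scipy.optimize.brentq), assuming all data lie strictly inside \((-1,1)\); the helper name rdist_mle_c is illustrative:

```python
import numpy as np
from scipy import optimize, special, stats

def rdist_mle_c(x, lo=1e-3, hi=1e3):
    """Solve psi(c/2) - psi((c+1)/2) = mean(log(1 - x^2)) for c."""
    s = np.mean(np.log1p(-np.asarray(x) ** 2))

    def score(c):
        # Proportional to dl/dc; decreasing in c, changes sign at the MLE.
        return s - (special.digamma(c / 2) - special.digamma((c + 1) / 2))

    return optimize.brentq(score, lo, hi)

rng = np.random.default_rng(1)
x = stats.rdist.rvs(8.0, size=5000, random_state=rng)
print(rdist_mle_c(x))  # should be close to the true c = 8
```

As \(c\to 0^+\) the score tends to \(+\infty\) and as \(c\to\infty\) it tends to \(\bar{s} < 0\), so the bracket [lo, hi] contains a sign change for any data set inside the support.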

def rdist_loglik(c: float, x: np.ndarray) -> float:
    c = _check_c(c)
    x = np.asarray(x, dtype=float)
    if np.any((x <= -1.0) | (x >= 1.0)):
        return -np.inf
    return float(np.sum(rdist_logpdf(x, c)))


def rdist_mom_c(x: np.ndarray) -> float:
    '''Method-of-moments estimate from Var(X)=1/(c+1).'''
    x = np.asarray(x, dtype=float)
    v = float(np.var(x, ddof=0))
    if v <= 0:
        return np.inf
    return max(1e-8, 1.0 / v - 1.0)


# Compare MOM vs SciPy's MLE fit on synthetic data
c_true = 8.0
n = 4000
x = stats.rdist.rvs(c_true, size=n, random_state=rng)

c_mom = rdist_mom_c(x)
(c_mle, loc_mle, scale_mle) = stats.rdist.fit(x, floc=0, fscale=1)

print("true c:", c_true)
print("MOM c :", c_mom)
print("MLE c :", c_mle)

print("loglik(c_true):", rdist_loglik(c_true, x))
print("loglik(c_mom): ", rdist_loglik(c_mom, x))
print("loglik(c_mle): ", rdist_loglik(c_mle, x))
true c: 8.0
MOM c : 7.837035002401219
MLE c : 7.830761718750015
loglik(c_true): -1271.0385836930934
loglik(c_mom):  -1270.5214543060163
loglik(c_mle):  -1270.5207357833792

7) Sampling & Simulation#

NumPy-only sampler#

Using \((X+1)/2 \sim \mathrm{Beta}(a,a)\) with \(a=c/2\), we can sample from rdist using only NumPy:

  1. Sample \(G_1, G_2 \overset{iid}{\sim} \mathrm{Gamma}(a,1)\)

  2. Form \(B = G_1/(G_1+G_2)\), so \(B \sim \mathrm{Beta}(a,a)\)

  3. Return \(X = 2B - 1\)

This avoids acceptance–rejection and is numerically stable for a wide range of \(c\).
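An alternative exact sampler is inverse-transform sampling through the Beta quantile function; a sketch using scipy.special.betaincinv (the function name rdist_rvs_inverse is illustrative):

```python
import numpy as np
from scipy import special

def rdist_rvs_inverse(c, size=1, rng=None):
    """Inverse-CDF sampler: X = 2 * Q_Beta(U; a, a) - 1 with a = c/2."""
    rng = np.random.default_rng() if rng is None else rng
    a = c / 2
    u = rng.uniform(size=size)
    # betaincinv inverts the regularized incomplete beta function, i.e. the
    # Beta(a, a) CDF, so this is exactly the rdist quantile transform.
    return 2.0 * special.betaincinv(a, a, u) - 1.0

rng = np.random.default_rng(2)
s = rdist_rvs_inverse(3.0, size=50_000, rng=rng)
print(s.min(), s.max())  # all samples in (-1, 1)
print(s.var())           # should be near 1/(c+1) = 0.25
```

Inverse sampling is handy when you need common random numbers or stratified/quasi-random inputs, at the cost of one special-function inversion per draw.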

# Sampling demo
c_demo = 1.5
samples = rdist_rvs_numpy(c_demo, size=10, rng=rng)
print(samples)

# Basic sanity: samples lie in [-1,1]
print("min/max:", samples.min(), samples.max())
[-0.687984 -0.570477  0.755541 -0.147472  0.250969 -0.816589  0.7435
 -0.996741 -0.828464  0.881516]
min/max: -0.9967410604892193 0.8815164217779459

8) Visualization#

We’ll visualize:

  • the PDF for a few \(c\) values

  • the CDF

  • a Monte Carlo histogram compared to the theoretical density

# PDF and CDF for a chosen c
c_vis = 5.0
x = np.linspace(-1 + 1e-4, 1 - 1e-4, 1500)

fig_pdf = go.Figure()
fig_pdf.add_trace(go.Scatter(x=x, y=rdist_pdf(x, c_vis), name="pdf (formula)"))
fig_pdf.add_trace(go.Scatter(x=x, y=stats.rdist.pdf(x, c_vis), name="pdf (scipy)", line=dict(dash="dash")))
fig_pdf.update_layout(title=f"PDF for c={c_vis}", xaxis_title="x", yaxis_title="density")
fig_pdf.show()

fig_cdf = go.Figure()
fig_cdf.add_trace(go.Scatter(x=x, y=rdist_cdf(x, c_vis), name="cdf (formula)"))
fig_cdf.add_trace(go.Scatter(x=x, y=stats.rdist.cdf(x, c_vis), name="cdf (scipy)", line=dict(dash="dash")))
fig_cdf.update_layout(title=f"CDF for c={c_vis}", xaxis_title="x", yaxis_title="F(x)")
fig_cdf.show()

# Monte Carlo histogram
n_hist = 80_000
s = rdist_rvs_numpy(c_vis, size=n_hist, rng=rng)

fig_hist = go.Figure()
fig_hist.add_trace(
    go.Histogram(
        x=s,
        nbinsx=90,
        histnorm="probability density",
        name="samples (hist)",
        opacity=0.6,
    )
)
fig_hist.add_trace(go.Scatter(x=x, y=rdist_pdf(x, c_vis), name="theoretical pdf", line=dict(color="black")))
fig_hist.update_layout(
    title=f"Monte Carlo samples vs pdf (c={c_vis})",
    xaxis_title="x",
    yaxis_title="density",
    barmode="overlay",
)
fig_hist.show()

9) SciPy Integration#

SciPy provides rdist as scipy.stats.rdist.

  • stats.rdist.pdf(x, c, loc=..., scale=...)

  • stats.rdist.cdf(x, c, loc=..., scale=...)

  • stats.rdist.rvs(c, size=..., random_state=...)

  • stats.rdist.fit(data, ...) (MLE by default)

Because rdist is a symmetric beta distribution, you can also work with stats.beta on \([0,1]\) and transform via \(x=2y-1\).

# Basic SciPy usage
c = 3.0
x = np.linspace(-0.999, 0.999, 9)

print("pdf:", stats.rdist.pdf(x, c))
print("cdf:", stats.rdist.cdf(x, c))
print("rvs:", stats.rdist.rvs(c, size=5, random_state=rng))

# Fit example: recover c when loc/scale are known
c_true = 6.0
data = stats.rdist.rvs(c_true, size=3000, random_state=rng)
(c_hat, loc_hat, scale_hat) = stats.rdist.fit(data, floc=0, fscale=1)
print("true c:", c_true)
print("fit  c:", c_hat)

# Identity check: (X+1)/2 ~ Beta(c/2, c/2)
y = (data + 1) / 2
pdf_beta = stats.beta.pdf((x + 1) / 2, a=c_true / 2, b=c_true / 2) / 2  # Jacobian factor
pdf_r = stats.rdist.pdf(x, c_true)
print("max |pdf_r - transformed_beta_pdf|:", np.max(np.abs(pdf_r - pdf_beta)))
pdf: [0.028463 0.421625 0.551513 0.616446 0.63662  0.616446 0.551513 0.421625
 0.028463]
cdf: [0.000019 0.072463 0.195777 0.342673 0.5      0.657327 0.804223 0.927537
 0.999981]
rvs: [ 0.022809 -0.961042 -0.883764 -0.258481 -0.646669]
true c: 6.0
fit  c: 6.054687500000011
max |pdf_r - transformed_beta_pdf|: 3.3306690738754696e-16

10) Statistical Use Cases#

10.1 Hypothesis testing#

Because rdist includes the uniform as \(c=2\), a simple test is:

  • \(H_0\): the data are uniform on \([-1,1]\) (i.e. \(c=2\))

  • \(H_1\): \(c\ne 2\)

A likelihood ratio test uses

\[ \Lambda = 2\bigl(\ell(\hat c) - \ell(2)\bigr), \]

which is approximately \(\chi^2_1\) under standard regularity assumptions (use with care in small samples).

10.2 Bayesian modeling#

rdist can be used as a prior over a symmetric bounded parameter, e.g. a correlation-like quantity \(\rho\in[-1,1]\). Larger \(c\) yields a prior more concentrated near 0.

10.3 Generative modeling#

If you need to generate bounded features or angles/cosines, rdist is a simple building block.

A particularly neat generative connection:

  • sample a random direction uniformly on the sphere \(S^{d-1}\)

  • the first coordinate has distribution rdist(c=d-1)

So rdist can generate realistic “random projection” coordinates without generating the full vector.

# 10.1 Likelihood ratio test: H0 c=2 (uniform) vs H1 c!=2

def rdist_loglik_scipy(c: float, x: np.ndarray) -> float:
    return float(np.sum(stats.rdist.logpdf(x, c)))


c_true = 5.0
n = 2000
x = stats.rdist.rvs(c_true, size=n, random_state=rng)

c_hat, _, _ = stats.rdist.fit(x, floc=0, fscale=1)

ll_hat = rdist_loglik_scipy(c_hat, x)
ll_null = rdist_loglik_scipy(2.0, x)

lr_stat = 2.0 * (ll_hat - ll_null)
p_value = stats.chi2.sf(lr_stat, df=1)

print("true c:", c_true)
print("MLE  c:", c_hat)
print("LR stat:", lr_stat)
print("approx p-value (chi^2_1):", p_value)
true c: 5.0
MLE  c: 5.0624023437500085
LR stat: 817.0196089693518
approx p-value (chi^2_1): 1.0758209264796379e-179
# 10.2 Bayesian modeling (simple grid posterior)
# Prior: rho ~ rdist(c_prior)
# Likelihood: y | rho ~ Normal(rho, sigma^2)

r = np.linspace(-1 + 1e-4, 1 - 1e-4, 2000)

y_obs = 0.55
sigma = 0.12

fig = go.Figure()

for c_prior in [1.0, 5.0, 20.0]:
    prior = rdist_pdf(r, c_prior)
    like = stats.norm.pdf(y_obs, loc=r, scale=sigma)
    post_unnorm = prior * like
    post = post_unnorm / np.trapz(post_unnorm, r)

    fig.add_trace(go.Scatter(x=r, y=prior, name=f"prior c={c_prior}", line=dict(dash="dash")))
    fig.add_trace(go.Scatter(x=r, y=post, name=f"posterior c={c_prior}"))

fig.update_layout(
    title=f"Grid posterior for rho in [-1,1] (y={y_obs}, sigma={sigma})",
    xaxis_title="rho",
    yaxis_title="density",
)
fig.show()
# 10.3 Generative modeling: coordinate of a random direction on S^{d-1}

d = 10
c_sphere = d - 1

n = 120_000
z = rng.normal(size=(n, d))
z = z / np.linalg.norm(z, axis=1, keepdims=True)
x1 = z[:, 0]

x_grid = np.linspace(-1 + 1e-4, 1 - 1e-4, 1500)

fig = go.Figure()
fig.add_trace(
    go.Histogram(
        x=x1,
        nbinsx=90,
        histnorm="probability density",
        name=f"sphere coord (d={d})",
        opacity=0.6,
    )
)
fig.add_trace(
    go.Scatter(
        x=x_grid,
        y=stats.rdist.pdf(x_grid, c_sphere),
        name=f"rdist(c={c_sphere}) pdf",
        line=dict(color="black"),
    )
)
fig.update_layout(
    title="First coordinate of a random unit vector matches rdist",
    xaxis_title="x1",
    yaxis_title="density",
    barmode="overlay",
)
fig.show()

11) Pitfalls#

  • Invalid parameters: require \(c>0\). In SciPy, also require scale > 0.

  • Endpoint behavior: for \(c<2\), the density diverges as \(|x|\to 1\). This is integrable, but plotting or evaluating exactly at \(x=\pm 1\) can produce inf.

  • Numerical stability: for large \(c\) or \(|x|\) near 1, compute in log-space (use logpdf, betaln, log1p).

  • Fitting: an unconstrained fit may try to move loc/scale. If you know the data should live on \([-1,1]\), use floc=0, fscale=1.

  • Out-of-support data: any \(|x|>1\) breaks the model; normalize or clip only if you can justify it scientifically.

# Numerical stability: pdf vs logpdf near the boundary for large c
c_big = 50.0
x_edge = np.array([0.0, 0.9, 0.99, 0.999, 0.9999])

pdf_direct = rdist_pdf(x_edge, c_big)
logpdf = rdist_logpdf(x_edge, c_big)

print("x:", x_edge)
print("pdf:", pdf_direct)
print("logpdf:", logpdf)
print("exp(logpdf):", np.exp(logpdf))
x: [0.     0.9    0.99   0.999  0.9999]
pdf: [2.806879 0.       0.       0.       0.      ]
logpdf: [   1.032073  -38.825476  -92.97678  -148.130524 -203.381763]
exp(logpdf): [2.806879 0.       0.       0.       0.      ]

12) Summary#

  • rdist is a continuous, symmetric distribution on \([-1,1]\) with shape parameter \(c>0\).

  • PDF: \(f(x\mid c)=(1-x^2)^{c/2-1}/B(1/2,c/2)\).

  • CDF: \(F(x\mid c)=I_{(x+1)/2}(c/2,c/2)\).

  • Mean \(0\), variance \(1/(c+1)\), excess kurtosis \(-6/(c+3)\); odd moments vanish.

  • Sampling is easy via the Beta identity: draw \(B\sim\mathrm{Beta}(c/2,c/2)\), return \(X=2B-1\).

  • In SciPy: stats.rdist.pdf, cdf, rvs, and fit cover most workflows.